
    Policy-Adaptive Estimator Selection for Off-Policy Evaluation

    Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data. Although many estimators have been developed, there is no single estimator that dominates the others, because the estimators' accuracy can vary greatly depending on a given OPE task such as the evaluation policy, number of actions, and noise level. Thus, the data-driven estimator selection problem is becoming increasingly important and can have a significant impact on the accuracy of OPE. However, identifying the most accurate estimator using only the logged data is quite challenging because the ground-truth estimation accuracy of estimators is generally unavailable. This paper studies this challenging problem of estimator selection for OPE for the first time. In particular, we enable an estimator selection that is adaptive to a given OPE task, by appropriately subsampling available logged data and constructing pseudo policies useful for the underlying estimator selection task. Comprehensive experiments on both synthetic and real-world company data demonstrate that the proposed procedure substantially improves the estimator selection compared to a non-adaptive heuristic. Comment: accepted at AAAI'2

    Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

    Ranking interfaces are everywhere in online platforms. There is thus an ever-growing interest in their Off-Policy Evaluation (OPE), which aims at an accurate performance evaluation of ranking policies using logged data. A de facto approach for OPE is Inverse Propensity Scoring (IPS), which provides an unbiased and consistent value estimate. However, it becomes extremely inaccurate in the ranking setup due to its high variance under large action spaces. To deal with this problem, previous studies assume either independent or cascade user behavior, resulting in some ranking versions of IPS. While these estimators are somewhat effective in reducing the variance, all existing estimators apply a single universal assumption to every user, causing excessive bias and variance. Therefore, this work explores a far more general formulation where user behavior is diverse and can vary depending on the user context. We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior. Moreover, AIPS achieves the minimum variance among all unbiased estimators based on IPS. We further develop a procedure to identify the appropriate user behavior model to minimize the mean squared error (MSE) of AIPS in a data-driven fashion. Extensive experiments demonstrate that the empirical accuracy improvement can be significant, enabling effective OPE of ranking systems even under diverse user behavior. Comment: KDD2023 Research trac
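
The plain IPS estimator mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's AIPS estimator or its ranking variants; it is the standard single-action IPS estimate on a toy bandit. The names `ips_estimate`, `simulate_logged_data`, and `target_prob`, the three-action setup, and the reward means are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy setup: 3 actions with Bernoulli rewards of these means.
REWARD_MEANS = [0.1, 0.5, 0.9]


def ips_estimate(logged, target_prob):
    """Standard IPS value estimate.

    logged: list of (context, action, reward, logging_prob) tuples, where
            logging_prob is the probability the logging policy gave `action`.
    target_prob: function (context, action) -> probability under the
                 evaluation policy.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged:
        # Reweight each logged reward by the ratio of policy probabilities.
        weight = target_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logged)


def simulate_logged_data(n, seed=0):
    """Generate n samples logged under a uniform-random policy (assumed)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        action = rng.randrange(len(REWARD_MEANS))
        reward = 1.0 if rng.random() < REWARD_MEANS[action] else 0.0
        data.append((None, action, reward, 1.0 / len(REWARD_MEANS)))
    return data


def target_prob(context, action):
    """Hypothetical evaluation policy: always choose action 2."""
    return 1.0 if action == 2 else 0.0


if __name__ == "__main__":
    logged = simulate_logged_data(20000, seed=0)
    # The true value of the evaluation policy in this toy setup is 0.9.
    print(ips_estimate(logged, target_prob))
```

In this sketch every importance weight is either 0 or the number of actions, so the spread of the weighted rewards grows as the action space grows. That is the variance problem the abstract describes for ranking setups, where the number of possible rankings is very large.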

    Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

    We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play a role similar to that of classical value functions in fully observable MDPs. We derive a new Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain a PAC result, which implies that our OPE estimator is consistent as long as futures and histories contain sufficient information about latent states and Bellman completeness holds. Finally, we extend our methods to the learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs. Comment: This paper was accepted in NeurIPS 202

    Effectiveness of a digital device providing real-time visualized tooth brushing instructions: A randomized controlled trial

    Introduction: The aim of this trial was to investigate whether a digital device that provides real-time visualized brushing instructions would contribute to the removal of dental plaque over usual brushing instructions. Methods: We conducted a single-center, parallel-group, stratified permuted block randomized controlled trial with a 1:1 allocation ratio. Eligible participants were aged ≥ 18 years. We excluded people who met any of the following criteria: severely crowded teeth; use of an interdental cleaning implement; external injury in the oral cavity or stomatitis; fewer than 20 teeth; use of an orthodontic apparatus; a visit to a dental clinic; the possibility of consulting a dental clinic; a dental license; no smartphone or tablet device; smoking; having taken antibiotics; pregnancy; an allergy to the staining fluid; or employment at Sunstar Inc. All participants received tooth brushing instructions using video materials and were randomly assigned to one of two groups for four weeks: (1) an intervention group that used the digital device, which provided real-time visualized instructions through a connected mobile application; and (2) a control group that used a digital device which only collected their brushing logs. The primary outcome was the change in the 6-point method plaque control record (PCR) score of all teeth between baseline and week 4. The t-test was used to compare the two groups in accordance with intention-to-treat principles. Results: Among 118 enrolled individuals, 112 participants were eligible for our analyses. The mean PCR score at week 4 was 45.05% in the intervention group and 49.65% in the control group, and the change in PCR score from baseline was −20.46% in the intervention group and −15.77% in the control group (p = 0.088, 95% confidence interval −0.70–10.07). Conclusions: A digital device providing real-time visualized brushing instructions may be effective for the removal of dental plaque.
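
The figures reported in the Results above can be cross-checked with a few lines of arithmetic. This sketch assumes the confidence interval refers to the control-minus-intervention difference in PCR change, which the abstract does not state explicitly; the variable names are illustrative only.

```python
# Values copied from the abstract (percentage points of PCR score).
change_intervention = -20.46
change_control = -15.77
ci_low, ci_high = -0.70, 10.07

# Assumed direction: control minus intervention.
difference = change_control - change_intervention

# For a symmetric t-based interval, the point estimate sits at the midpoint.
midpoint = (ci_low + ci_high) / 2

# The interval includes zero, in line with p = 0.088 exceeding 0.05.
includes_zero = ci_low < 0 < ci_high

print(round(difference, 2), round(midpoint, 3), includes_zero)
```

Under that assumption the between-group difference is about 4.69 percentage points, which matches the midpoint of the reported interval. The interval spans zero, which agrees with the hedged wording of the Conclusions.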

    Prehospital cardiopulmonary resuscitation duration and neurological outcome after out-of-hospital cardiac arrest among children by location of arrest: a Nationwide cohort study

    Background: Little is known about the associations between the duration of prehospital cardiopulmonary resuscitation (CPR) by emergency medical services (EMS) and outcomes among paediatric patients with out-of-hospital cardiac arrests (OHCAs). We investigated these associations and the optimal prehospital EMS CPR duration by the location of arrests. Methods: We included paediatric patients aged 0–17 years with OHCAs before EMS arrival who were transported to medical institutions after resuscitation by bystanders or EMS personnel. We excluded paediatric OHCA patients for whom CPR was not performed, who had cardiac arrest after EMS arrival, whose EMS CPR duration were  30 min) in both groups (1.4% [6/417] in residential locations and 0.6% [1/170] in public locations). Conclusions: A longer prehospital EMS CPR duration is independently associated with a lower proportion of patients with a favourable neurological outcome. The association between prehospital EMS CPR duration and neurological outcome differed significantly by location of arrests.